PTO/SB/05 (08-00) 

Please type a plus sign (+) inside this box
Approved for use through 10/31/2002. OMB 0651-0032

U.S. Patent and Trademark Office, U.S. DEPARTMENT OF COMMERCE

Under the Paperwork Reduction Act of 1995, no persons are required to respond to a collection of information unless it displays a valid OMB control number.






UTILITY 
PATENT APPLICATION 
TRANSMITTAL 

(Only for new nonprovisional applications)



Attorney Docket No. 


P3816


First Inventor 


Mario Nemirovsky et al. 


Title 


Fetch and Dispatch Decoupling Mechanism for 
Multistreaming Processors


Express Mail Label No. 


EL573446764US



APPLICATION ELEMENTS 

See MPEP chapter 600 concerning utility patent application contents. 



1. [X] Fee Transmittal Form (e.g., PTO/SB/17)
(Submit an original and a duplicate for fee processing)

2. [X] Applicant claims small entity status.
See 37 CFR 1.27.

3. [X] Specification [Total Pages ]

(preferred arrangement set forth below) 

- Descriptive title of the invention 

- Cross Reference to Related Applications 

- Statement Regarding Fed sponsored R&D 

- Reference to sequence listing, a table, 
or a computer program listing appendix 

- Background of the Invention 

- Brief Summary of the invention 

- Brief Description of the Drawings {if filed) 

- Detailed Description 

- Claim(s) 

- Abstract of the Disclosure 






ADDRESS TO: 



Assistant Commissioner for Patents 
Box Patent Application 
Washington, DC 20231



7. CD-ROM or CD-R in duplicate, large table or
Computer Program (Appendix)

8. Nucleotide and/or Amino Acid Sequence Submission
(if applicable, all necessary)

a. □ Computer Readable Form (CRF)

b. Specification Sequence Listing on:

i. □ CD-ROM or CD-R (2 copies); or
ii. □ paper

Statements verifying identity of above copies



4. [X] Drawing(s) (35 U.S.C. 113)



[ Total Sheets HT1 ] 
[ Total Pages HT1 ] 



5. Oath or Declaration

a. Newly executed (original or copy)

b. □ Copy from a prior application (37 CFR 1.63(d))
(for continuation/divisional with Box 17 completed)

i. □ DELETION OF INVENTOR(S)

Signed statement attached deleting inventor(s)
named in the prior application, see 37 CFR
1.63(d)(2) and 1.33(b).

Application Data Sheet. See 37 CFR 1.76



□ 



ACCOMPANYING APPLICATION PARTS 



37 CFR 3.73(b) Statement
(when there is an assignee) 



Power of 
Attorney 



9. [X] Assignment Papers (cover sheet & document(s)) 

□ 
□ 
□ 
□ 

□ 



English Translation Document (if applicable)

Information Disclosure Statement (IDS)/PTO-1449    □ Copies of IDS Citations

Preliminary Amendment

Return Receipt Postcard (MPEP 503)
(Should be specifically itemized)

Certified Copy of Priority Document(s)
(if foreign priority is claimed)

Other: Check for fees



17. If a CONTINUING APPLICATION, check appropriate box, and supply the requisite information below and in a preliminary amendment
or in an Application Data Sheet under 37 CFR 1.76:

□ Continuation □ Divisional □ Continuation-in-part (CIP) of prior application No.: /

Prior application information: Examiner NA    Group / Art Unit: NA

For CONTINUATION OR DIVISIONAL APPS only: The entire disclosure of the prior application, from which an oath or declaration is supplied under
Box 5b, is considered a part of the disclosure of the accompanying continuation or divisional application and is hereby incorporated by reference.
The incorporation can only be relied upon when a portion has been inadvertently omitted from the submitted application parts.

18. CORRESPONDENCE ADDRESS 



[X] Customer Number or Bar Code Label



24739 



Name 



Address 



City 




Correspondence address below 






State 



Zip Code 



Name (Print/Type) 




Registration No. (Attorney/Agent)


35,074 


Signature

Date 11/03/2000



the amount of time you are required to complete this form should be sent to the Chief Information Officer, U.S. Patent and Trademark Office, Washington, DC
20231. DO NOT SEND FEES OR COMPLETED FORMS TO THIS ADDRESS. SEND TO: Assistant Commissioner for Patents, Box Patent Application,
Washington, DC 20231.



Certificate of Express Mailing 



"Express Mail" Mailing Label Number: EL573446764US

Date of Deposit: 11/03/2000 

Ref: Case Docket No.: P3816 

First Named Inventor: Mario Nemirovsky et al. 

Serial Number: NA 

Filing Date: 11/03/2000 

Title of Case: Fetch and Dispatch Decoupling Mechanism for 
Multistreaming Processors 

I hereby certify that the attached papers are being deposited with the United States Postal 
Service "Express Mail Post Office to Addressee" service under 37 C.F.R. 1.10 on the date
indicated above and addressed to the Commissioner of Patents and Trademarks, 
Washington D.C. 20231 



1. Utility patent application transmittal.

2. 12 sheets of specification.

3. 3 sheets of drawings.

4. Fee transmittal.

5. Duplicate fee transmittal.

6. Declaration and Power of Attorney.

7. Verified Statement Claiming Small Entity.

8. Recordation Cover Sheet.

9. Assignment.

10. Check for fees in the amount of $395.00.

11. Certificate of express mailing.

12. Postcard listing contents.



Mark A. Boys 
(Typed or printed name of person mailing paper or fee) 

(Signature of person mailing paper or fee)



PTO/SB/17 (09-00) 
Approved for use through 10/31/2002. OMB 0651-0032 
U.S. Patent and Trademark Office, U.S. DEPARTMENT OF COMMERCE

Under the Paperwork Reduction Act of 1995, no persons are required to respond to a collection of information unless it displays a valid OMB control number.

FEE TRANSMITTAL
for FY 2001

Patent fees are subject to annual revision.



TOTAL AMOUNT OF PAYMENT 



($) 395.00 



Complete if Known

Application Number: NA
Filing Date: 11/03/2000
First Named Inventor: Mario Nemirovsky et al.
Examiner Name: NA
Group Art Unit: NA
Attorney Docket No.: P3816



METHOD OF PAYMENT 



FEE CALCULATION 



(continued) 



The Commissioner is hereby authorized to charge
indicated fees and credit any overpayments to:

Deposit Account Number:
Deposit Account Name:



3. ADDITIONAL FEES 

Large Entity    Small Entity
Fee Fee Fee Fee
Code ($) Code ($)



Fee Description 



Fee Paid 



53 
12 



Charge Any Additional Fee Required
Under 37 CFR 1.16 and 1.17

Applicant claims small entity status
See 37 CFR 1.27



2. [X] Payment Enclosed:

[X] Check    □ Credit card    □ Money Order    □ Other



FEE CALCULATION 



1. BASIC FILING FEE 

Large Entity    Small Entity
Fee     Fee     Fee     Fee
Code    ($)     Code    ($)     Fee Description            Fee Paid
101     710     201     355     Utility filing fee         355.00
106     320     206     160     Design filing fee
107     490     207     245     Plant filing fee
108     710     208     355     Reissue filing fee
114     150     214     75      Provisional filing fee



SUBTOTAL (1) ($) 355.00 



2. EXTRA CLAIM FEES 

Extra Claims 



Total Claims 
independent [ 
Claims L 



-20* 1 
- 3** 



IT 



Fee from 

below Fee Paid 

= l ooF ~ 

=l OOP" 



40 



Multiple Dependent 



Large Entity    Small Entity
Fee     Fee     Fee     Fee
Code    ($)     Code    ($)     Fee Description
103     18      203     9       Claims in excess of 20
102     80      202     40      Independent claims in excess of 3
104     270     204     135     Multiple dependent claim, if not paid
109     80      209     40      ** Reissue independent claims over original patent
110     18      210             ** Reissue claims in excess of 20 and over original patent



SUBTOTAL (2) 



($) 0.00 



Large Entity    Small Entity
Fee     Fee     Fee     Fee
Code    ($)     Code    ($)     Fee Description
105     130     205     65      Surcharge - late filing fee or oath
127     50      227     25      Surcharge - late provisional filing fee or cover sheet
139     130     139     130     Non-English specification
147     2,520   147     2,520   For filing a request for ex parte reexamination
112     920*    112     920*    Requesting publication of SIR prior to Examiner action
113     1,840*  113     1,840*  Requesting publication of SIR after Examiner action
115     110     215     55      Extension for reply within first month
116     390     216     195     Extension for reply within second month
117     890     217     445     Extension for reply within third month
118     1,390   218     695     Extension for reply within fourth month
128     1,890   228     945     Extension for reply within fifth month
119     310     219     155     Notice of Appeal
120     310     220     155     Filing a brief in support of an appeal
121     270     221     135     Request for oral hearing
138     1,510   138     1,510   Petition to institute a public use proceeding
140     110     240     55      Petition to revive - unavoidable
141     1,240   241     620     Petition to revive - unintentional
142     1,240   242     620     Utility issue fee (or reissue)
143     440     243     220     Design issue fee
144     600     244     300     Plant issue fee
122     130     122     130     Petitions to the Commissioner
123     50      123     50      Petitions related to provisional applications
126     240     126     240     Submission of Information Disclosure Stmt
581     40      581     40      Recording each patent assignment per property (times number of properties)
146     710     246     355     Filing a submission after final rejection (37 CFR § 1.129(a))
149     710     249     355     For each additional invention to be examined (37 CFR § 1.129(b))
179     710     279     355     Request for Continued Examination (RCE)
169     900     169     900     Request for expedited examination of a design application

Other fee (specify)

Fee Paid: 40



**or number previously paid, if greater; For Reissues, see above

*Reduced by Basic Filing Fee Paid

SUBTOTAL (3) ($) 40.00



SUBMITTED BY 



(Complete if applicable)



Name (Print/Type) 




Registration No.
(Attorney/Agent)

35,074



Telephone 



(831) 726-1457 



Signature 



Date 



11/03/2000 



WARNING: Information on this form may become public. Credit card information should not 
be included on this form. Provide credit card information and authorization on PTO-2038. 

Burden Hour Statement: This form is estimated to take 0.2 hours to complete. Time will vary depending upon the needs of the individual case. Any comments on
the amount of time you are required to complete this form should be sent to the Chief Information Officer, U.S. Patent and Trademark Office, Washington, DC
20231. DO NOT SEND FEES OR COMPLETED FORMS TO THIS ADDRESS. SEND TO: Assistant Commissioner for Patents, Washington, DC 20231.






PTOfcB/fO (IMS) 
^ Approved uee through «V31gft. OM8 095^-0031 

Patent and Trademark Office; U.S. DEPARTMENT OF COMMERCE
Under the Paperwork Reduction Act of 1995, no persons are required to respond to a collection of information unless it displays a valid OMB control number.



VERIFIED STATEMENT CLAIMING SMALL ENTITY STATUS 
(37 CFR 1.9(f) & 1.27(c)) - SMALL BUSINESS CONCERN



Docket Number (Optional) 
P3816 



Applicant or Patentee: Mario Nemirovsky et al.

Application or Patent No.: NA

Filed or Issued: NA

Title: Fetch and Dispatch Decoupling Mechanism for Multistreaming Processors

I hereby declare that I am
□ the owner of the small business concern identified below:

[X] an official of the small business concern empowered to act on behalf of the concern identified below:

NAME OF SMALL BUSINESS CONCERN: XStream Logic, Inc.

ADDRESS OF SMALL BUSINESS CONCERN: 750 University Ave., Ste 270



I hereby declare that the above identified small business concern qualifies as a small business concern as defined
in 13 CFR 121.12, and reproduced in 37 CFR 1.9(d), for purposes of paying reduced fees to the United States Patent and
Trademark Office, in that the number of employees of the concern, including those of its affiliates, does not exceed 500
persons. For purposes of this statement, (1) the number of employees of the business concern is the average over the
previous fiscal year of the concern of the persons employed on a full-time, part-time, or temporary basis during each of the
pay periods of the fiscal year, and (2) concerns are affiliates of each other when either, directly or indirectly, one concern
controls or has the power to control the other, or a third party or parties controls or has the power to control both.

I hereby declare that rights under contract or law have been conveyed to and remain with the small business concern
identified above with regard to the invention described in:

[X] the specification filed herewith with title as listed above.
□ the application identified above.
□ the patent identified above.

If the rights held by the above identified small business concern are not exclusive, each individual, concern, or
organization having rights in the invention must file separate verified statements averring to their status as small entities,
and no rights to the invention are held by any person, other than the inventor, who would not qualify as an independent inventor
under 37 CFR 1.9(c) if that person made the invention, or by any concern which would not qualify as a small business concern
under 37 CFR 1.9(d), or a nonprofit organization under 37 CFR 1.9(e).

Each person, concern, or organization having any rights in the invention is listed below:
[X] no such person, concern, or organization exists.
□ each such person, concern, or organization is listed below.



Separate verified statements are required from each named person, concern or organization having rights to the
invention averring to their status as small entities. (37 CFR 1.27)

I acknowledge the duty to file, in this application or patent, notification of any change in status resulting in loss of
entitlement to small entity status prior to paying, or at the time of paying, the earliest of the issue fee or any maintenance
fee due after the date on which status as a small entity is no longer appropriate. (37 CFR 1.28(b))

I hereby declare that all statements made herein of my own knowledge are true and that all statements made on
information and belief are believed to be true; and further that these statements were made with the knowledge that willful
false statements and the like so made are punishable by fine or imprisonment, or both, under section 1001 of Title 18 of
the United States Code, and that such willful false statements may jeopardize the validity of the application, any patent issuing
thereon, or any patent to which this verified statement is directed.



NAME OF PERSON SIGNING: Dan O'Neill

TITLE OF PERSON IF OTHER THAN OWNER: CEO

ADDRESS OF PERSON SIGNING: 750 University Ave., Los Gatos, CA 95032



SIGNATURE 




DATE


Fetch and Dispatch Decoupling Mechanism for 
Multistreaming Processors 

by inventors 
Mario Nemirovsky, Adolfo Nemirovsky, 
Narendra Sankar and Enrique Musoll, 



Field of the Invention 

The present invention is in the field of digital processing and pertains 
more particularly to apparatus and methods for fetching and dispatching 
instructions in dynamic multistreaming processors. 

Background of the Invention 

Conventional pipelined single-stream processors, like most conventional
processors, incorporate fetch and dispatch pipeline stages. In
such processors, in the fetch stage, one or more instructions are read from an 
instruction cache and in the dispatch stage, one or more instructions are sent 
to execution units (EUs) to execute. These stages may be separated by one 
or more other stages, for example a decode stage. In such a processor the 
fetch and dispatch stages are coupled together such that the fetch stage 
generally fetches from the instruction stream in every cycle. 

In multistreaming processors known to the present inventors, 
multiple instruction streams are provided, each having access to the 
execution units. Multiple fetch stages may be provided, one for each 
instruction stream, although one dispatch stage is employed. Thus, the fetch 
and dispatch stages are coupled to one another as in other conventional 
processors, and each instruction stream generally fetches instructions in each
cycle. That is, if there are five instruction streams, each of the five fetches in
each cycle, and there needs to be a port to the instruction cache for each 
stream, or a separate cache for each stream. 

In a multistreaming processor multiple instruction streams share a 
common set of resources, for example execution units and/or access to 
memory resources. In such a processor, for example, there may be M 
instruction streams that share Q execution units in any given cycle. This 
means that a set of up to Q instructions is chosen from the M instruction 
streams to be delivered to the execution units in each cycle. In the following 
cycle a different set of up to Q instructions is chosen, and so forth. More 
than one instruction may be chosen from the same instruction stream, up to a 
maximum P, given that there are no dependencies between the instructions. 
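
By way of illustration only, the per-cycle choice of up to Q instructions from M streams, with at most P taken from any one stream, may be sketched as follows. The function name `choose_instructions`, the round-robin order among streams, and the modelling of each stream as a count of ready, mutually independent instructions are assumptions of this sketch and are not drawn from any particular processor.

```python
def choose_instructions(ready_counts, q, p):
    """Decide how many instructions to take from each stream in one cycle.

    ready_counts[i] is the number of ready, mutually independent instructions
    stream i can offer; q is the number of execution units; p is the cap on
    instructions taken from a single stream. Round-robin order is an
    assumption of this sketch.
    """
    taken = [0] * len(ready_counts)
    remaining = q
    progress = True
    while remaining > 0 and progress:
        progress = False
        for i, ready in enumerate(ready_counts):
            if remaining == 0:
                break
            # Take one more instruction from stream i if it has one and is under the cap.
            if taken[i] < min(ready, p):
                taken[i] += 1
                remaining -= 1
                progress = True
    return taken
```

When the streams together cannot offer Q instructions, the returned counts sum to fewer than Q, which is the underutilization the following paragraph addresses.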

It is desirable in multistreaming processors to maximize the number 
of instructions executed in each cycle. This means that the set of up to Q 
instructions that is chosen in each cycle should be as close to Q as possible. 
Reasons that there may not be Q instructions available include flow 
dependencies, stalls due to memory operations, stalls due to branches, and 
instruction fetch latency. 

What is clearly needed in the art is an apparatus and method to decouple
dispatch operations from fetch operations. The present invention, in
several embodiments described in enabling detail below, provides a unique
solution.

Summary of the Invention

In a preferred embodiment of the present invention a pipelined
multistreaming processor is provided, comprising an instruction source, a
plurality of streams fetching instructions from the instruction source, a
dispatch stage for selecting and dispatching instructions to a set of execution
units, a set of instruction queues having one queue associated with each
stream in the plurality of streams, and located in the pipeline between the
instruction source and the dispatch stage, and a select system for selecting
streams in each cycle to fetch instructions from the instruction source. The
processor is characterized in that the number of streams selected for which
to fetch instructions in each cycle is fewer than the number of streams in the
plurality of streams.



In some embodiments the number of streams in the plurality of
streams is eight, and the number of streams selected for which to fetch
instructions in each cycle is two. Also in some embodiments the select
system monitors a set of fetch program counters (FPC) having one FPC
associated with each stream, and directs fetching of instructions beginning at
addresses according to the program counters. In still other embodiments
each stream selected to fetch is directed to fetch eight instructions from the
instruction cache.

In some embodiments there is a set of execution units to which the
dispatch stage dispatches instructions. In some embodiments the set of
execution units comprises eight Arithmetic-Logic Units (ALUs), and two
memory units.

In another aspect of the invention, in a pipelined multistreaming
processor having an instruction queue, a method for decoupling fetching
from a dispatch stage is provided, comprising the steps of (a) placing a set of
instruction queues, one for each stream, in the pipeline between the 
instruction queue and the dispatch stage; and (b) selecting one or more 
streams, fewer than the number of streams in the multistreaming processor, 
for which to fetch instructions in each cycle from an instruction source. 

In some embodiments of the method the number of streams in the 
plurality of streams is eight, and the number of streams selected for which to 
fetch instructions in each cycle is two. In some embodiments the select 
system monitors a set of fetch program counters (FPC) having one FPC 
associated with each stream, and directs fetching of instructions beginning at
addresses according to the program counters. In other embodiments
each stream selected to fetch is directed to fetch eight instructions from the
instruction source. In preferred embodiments, also, the dispatch stage
dispatches instructions to a set of execution units, which may comprise eight
Arithmetic-Logic Units (ALUs), and two memory units.

In embodiments of the present invention, described in enabling detail 
below, for the first time apparatus and methods are provided for
decoupling fetch and dispatch in processors, and particularly in
multistreaming processors. 

Brief Description of the Drawings 

Fig. 1 is a block diagram depicting a pipelined structure for a 
processor in the prior art. 

Fig. 2 is a block diagram depicting a pipelined structure for a 
multistreaming processor known to the present inventors. 



Fig. 3 is a block diagram of a pipelined architecture for a
multistreaming processor according to an embodiment of the present 
invention. 

Description of the Preferred Embodiments 

Fig. 1 is a block diagram depicting a pipelined structure for a 
processor in the prior art. In this prior art structure there is an instruction 
cache 11, wherein instructions await selection for execution, a fetch stage 13
which selects and fetches instructions into the pipeline, and a dispatch stage
which dispatches instructions to execution units (EUs) 17. In many 
conventional pipelined structures there are additional stages other than the 
exemplary stages illustrated here. 

In the simple architecture illustrated in Fig. 1 everything works in 
lockstep. In each cycle an instruction is fetched, and another previously
fetched instruction is dispatched to one of the execution units. 

Fig. 2 is a block diagram depicting a pipelined structure for a 
multistreaming processor known to the present inventors, wherein a single 
instruction cache 19 has ports for three separate streams, and a fetch is made 
per cycle by each of three fetch stages 21, 23 and 25 (one for each stream). 
In this particular case a single dispatch stage 27 selects instructions from a 
pool fed by the three streams and dispatches those instructions to one or 
another of three execution units 29. In this architecture the fetch and 
dispatch units are still directly coupled. It should be noted that the 
architecture of Fig. 2, while prior to the present invention, is not necessarily 
in the public domain, as it is an as-yet proprietary architecture known to the
present inventors. In another example, there may be separate caches for
separate streams, but this does not provide the desired de-coupling. 

Fig. 3 is a block diagram depicting an architecture for a dynamic 
multistreaming (DMS) processor according to an embodiment of the present 
invention. In this DMS processor there are eight streams and ten functional 
units. Instruction cache 31 in this embodiment has two ports for providing
instructions to fetch stage 33. Eight instructions may be fetched each cycle 
for each port, so 16 instructions may be fetched per cycle. 

In a preferred embodiment of the present invention instruction 
queues 39 are provided, which effectively decouple fetch and dispatch stages 
in the pipeline. There are in this embodiment eight instruction queues, one 
for each stream. In the example of Fig. 3 the instruction queues are shown 
in a manner to illustrate that each queue may have a different number of 
instructions ready for transfer to a dispatch stage 41. 

Referring again to instruction cache 31 and the two ports to fetch
stage 33, it was described above that eight instructions may be fetched to 
stage 33 via each port. Typically the eight instructions for one port are eight 
instructions from a single thread for a single stream. For example, the eight 
instructions fetched by one port in a particular cycle will typically be 
sequential instructions for a thread associated with one stream. 

Determination of the two threads associated with two streams to be 
accessed in each cycle is made by selection logic 35. Logic 35 monitors a 
set of fetch program counters 37, which maintain a program counter for each 
stream, indicating at what address to find the next instruction for that stream. 
Select logic 35 also monitors the state of each queue in set 39 of instruction 
queues. Based at least in part on the state of instruction queues 39 select 
logic 35 determines the two threads from which to fetch instructions in a 
particular cycle. For example, if the instruction queue in set 39 for a stream
is full, the probability of utilizing eight additional instructions into the
pipeline from the thread associated with that stream is low. Conversely, if
the instruction queue in set 39 for a stream is empty, the probability of
utilizing eight additional instructions into the pipeline from the thread
associated with that stream is high.
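
The selection just described may be sketched, by way of example only, as a function that picks the streams to fetch for in one cycle from the occupancy of their instruction queues. The queue capacity, the rule of skipping any queue without room for a full group of eight, and the emptiest-queue-first ordering are assumptions of this sketch; the description states only that selection is based at least in part on queue state.

```python
def select_fetch_streams(queue_lengths, queue_capacity, ports=2, fetch_width=8):
    """Return the indices of the streams to fetch for in this cycle.

    queue_lengths[i] is the current occupancy of the queue for stream i.
    Queues without room for a full fetch group are skipped, and the emptiest
    remaining queues are chosen first; ties go to the lower stream index.
    This policy is an assumption of the sketch.
    """
    candidates = [
        i for i, length in enumerate(queue_lengths)
        if queue_capacity - length >= fetch_width
    ]
    # list.sort is stable, so equal occupancies keep ascending stream order.
    candidates.sort(key=lambda i: queue_lengths[i])
    return candidates[:ports]
```

For each selected stream, fetching would then begin at the address held in that stream's fetch program counter.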

In this embodiment, in each cycle, four instructions are made
available to dispatch stage 41 from each instruction queue. In practice
dispatch logic is provided for selecting from which queues to dispatch
instructions. The dispatch logic has knowledge of many parameters,
typically including priorities, instruction dependencies, and the like, and is
also aware of the number of instructions in each queue.

As described above, there are in this preferred embodiment ten
execution units, which include two memory units 43 and eight arithmetic
logic units (ALUs) 45. Thus, in each cycle up to ten instructions may be
dispatched to execution units.
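
A single dispatch cycle over the eight ALUs and two memory units may be sketched as follows, by way of example only. The sketch ignores priorities and instruction dependencies, visits the queues in fixed stream order, and stops issuing from a stream at the first instruction that finds no free unit; these simplifications, and the name `dispatch_cycle`, are assumptions of the sketch.

```python
from collections import deque


def dispatch_cycle(queues, alus=8, mem_units=2, window=4):
    """Dispatch up to alus + mem_units instructions in one cycle.

    Each queue is a deque of "alu" or "mem" strings, oldest first. Only the
    first `window` entries of each queue are visible to dispatch. In-order
    issue within a stream is an assumption of this sketch.
    """
    free = {"alu": alus, "mem": mem_units}
    dispatched = []
    for stream, queue in enumerate(queues):
        for _ in range(min(window, len(queue))):
            kind = queue[0]
            if free[kind] == 0:
                break  # no unit of the needed kind is left for this stream
            free[kind] -= 1
            dispatched.append((stream, queue.popleft()))
    return dispatched
```

Instructions that are not dispatched simply remain at the head of their queue for a later cycle.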

In the system depicted by Fig. 3 the unique and novel set of
instruction queues 39 provides decoupling of dispatch from fetch in the
pipeline. The dispatch stage now has a larger pool of instructions from
which to select to dispatch to execution units, and the efficiency of dispatch
is improved. That is, the number of instructions that may be dispatched per
cycle is maximized. This structure and operation allows a large number of
streams of a DMS processor to execute instructions continually while
permitting the fetch mechanism to fetch from a smaller number of streams in
each cycle. Fetching from a smaller number of streams, in this case two, in
each cycle is important, because the hardware and logic necessary to provide
additional ports into the instruction cache is significant. As an added benefit,
unified access to a single cache is provided.



Thus the instruction queue in the preferred embodiment allows 
fetched instructions to be buffered after fetch and before dispatch. The 
instruction queue read mechanism allows the head of the queue to be 
presented to dispatch in each cycle, allowing a variable number of 
instructions to be dispatched from each stream in each cycle. With the 
instruction queue, one can take advantage of instruction stream locality, 
while maximizing the efficiency of the fetch mechanism in the presence of 
stalls and branches. By providing a fetch mechanism that can support up to
eight instructions from each of two streams, one can keep the instruction queues full
while not having to replicate the fetch bandwidth across all streams. 
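
The buffering effect of the instruction queues may be illustrated, by way of example only, with a small occupancy model of eight streams, two fetch ports of eight instructions each, and ten execution units. The queue capacity of sixteen, the emptiest-queue fetch policy, the fixed stream order at dispatch, and the absence of stalls, branches and dependencies are all assumptions of this sketch, so it shows only that dispatch can be kept supplied while fetching for two streams per cycle.

```python
def simulate(cycles, streams=8, ports=2, fetch_width=8,
             capacity=16, window=4, units=10):
    """Return the number of instructions dispatched in each simulated cycle.

    Queues are modelled as occupancy counts only. The capacity, the fetch
    policy and the dispatch order are assumptions of this sketch.
    """
    occupancy = [0] * streams
    history = []
    for _ in range(cycles):
        # Fetch: refill at most `ports` queues that have room for a full group.
        candidates = [i for i in range(streams)
                      if capacity - occupancy[i] >= fetch_width]
        candidates.sort(key=lambda i: occupancy[i])
        for i in candidates[:ports]:
            occupancy[i] += fetch_width
        # Dispatch: draw from any queue, regardless of which streams fetched.
        free_units = units
        for i in range(streams):
            take = min(window, occupancy[i], free_units)
            occupancy[i] -= take
            free_units -= take
        history.append(units - free_units)
    return history
```

Under these assumptions only the first cycle dispatches fewer than ten instructions; thereafter the queues hold enough instructions to fill all ten units even in cycles where no queue has room for a new fetch.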

The skilled artisan will recognize that there are a number of 
alterations that might be made in embodiments of the invention described 
above without departing from the spirit and scope of the invention. For 
example, the number of instruction queues may vary, the number of ports 
into the instruction cache may vary, the fetch logic may be implemented in a 
variety of ways, and the dispatch logic may be implemented in a variety of 
ways, among other changes that may be made within the spirit and scope of 
the invention. For these and other reasons the invention should be afforded 
the broadest scope, and should be limited only by the claims that follow. 



What is claimed is: 



1. A pipelined multistreaming processor, comprising:

an instruction source;

a plurality of streams fetching instructions from the instruction
source;

a dispatch stage for selecting and dispatching instructions to a set of 
execution units; 

a set of instruction queues having one queue associated with each 
stream in the plurality of streams, and located in the pipeline between the 
instruction source and the dispatch stage; and 

a select system for selecting streams in each cycle to fetch 
instructions from the instruction source; 

characterized in that the number of streams selected for which to 
fetch instructions in each cycle is fewer than the number of streams in the 
plurality of streams. 

2. The processor of claim 1 wherein the number of streams in the plurality 
of streams is eight, and the number of streams selected for which to fetch 
instructions in each cycle is two. 

3. The processor of claim 2 wherein the select system monitors a set of 
fetch program counters (FPC) having one FPC associated with each stream, 
and directs fetching of instructions beginning at addresses according to
the program counters.

4. The processor of claim 2 wherein each stream selected to fetch is directed
to fetch eight instructions from the instruction cache.

5. The processor of claim 1 further comprising a set of execution units to
which the dispatch stage dispatches instructions. 

6. The processor of claim 5 wherein the set of execution units comprises 
eight Arithmetic-Logic Units (ALUs), and two memory units. 

7. In a pipelined multistreaming processor having an instruction queue, a 
method for decoupling fetching from a dispatch stage, comprising the steps 
of: 

(a) placing a set of instruction queues, one for each stream, in the 
pipeline between the instruction queue and the dispatch stage; and 

(b) selecting one or more streams, fewer than the number of streams 
in the multistreaming processor, for which to fetch instructions in each cycle 
from an instruction source. 

8. The method of claim 7 wherein the number of streams in the plurality of 
streams is eight, and the number of streams selected for which to fetch 
instructions in each cycle is two. 

9. The method of claim 8 wherein the select system monitors a set of fetch 
program counters (FPC) having one FPC associated with each stream, and 
directs fetching of instructions beginning at addresses according to the
program counters.

10. The method of claim 7 wherein each stream selected to fetch is directed 
to fetch eight instructions from the instruction source. 






11. The method of claim 6 wherein the dispatch stage dispatches
instructions to a set of execution units. 

12. The method of claim 11 wherein the set of execution units comprises
eight Arithmetic-Logic Units (ALUs), and two memory ports.

Abstract of the Disclosure

A pipelined multistreaming processor has an instruction source, a 
plurality of streams fetching instructions from the instruction source, a 
dispatch stage for selecting and dispatching instructions to a set of execution 
units, a set of instruction queues having one queue associated with each 
stream in the plurality of streams, and located in the pipeline between the 
instruction cache and the dispatch stage, and a select system for selecting 
streams in each cycle to fetch instructions from the instruction cache. The 
processor is characterized in that the select system selects one or more 
streams in each cycle for which to fetch instructions from the instruction 
cache, and in that the number of streams selected for which to fetch 
instructions in each cycle is fewer than the number of streams in the plurality 
of streams.